NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Offline Multitask Representation Learning for Reinforcement Learning

Ishfaq, Haque; Nguyen-Tang, Thanh; Feng, Songtao; Arora, Raman; Wang, Mengdi; Yin, Ming; Precup, Doina (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))

We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with an offline dataset from different tasks that share a common representation and is tasked to learn the shared representation. We theoretically investigate offline multitask low-rank RL, and propose a new algorithm called MORL for offline multitask representation learning. Furthermore, we examine downstream RL in reward-free, offline and online scenarios, where a new task is introduced to the agent that shares the same representation as the upstream offline tasks. Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline task instead of directly learning the representation of the low-rank model.
more » « less
Full Text Available
Improving sample efficiency of model-free algorithms for zero-sum Markov games

Feng, Songtao; Yin, Ming; Wang, Yu-Xiang; Yang, Jing; Liang, Yingbin (July 2024, Proc. International Conference on Machine Learning (ICML))

Full Text Available
Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games

Feng, Songtao; Yin, Ming; Wang, Yu-Xiang; Yang, Jing; Liang, Yingbin (July 2024, Proceedings of Machine Learning Research)

The problem of two-player zero-sum Markov games has recently attracted increasing interests in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an ϵ-optimal Nash Equilibrium (NE) with the sample complexity of O(H3SAB/ϵ2), which is optimal in the dependence of the horizon H and the number of states S (where A and B denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such an optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the H dependence as model-based algorithms. The main improvement of the dependency on H arises by leveraging the popular variance reduction technique based on the reference-advantage decomposition previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend such a technique to Markov games, our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history in order to achieve the desired improvement in the sample efficiency.
more » « less
Full Text Available
Towards General Function Approximation in Nonstationary Reinforcement Learning

Feng, Songtao; Yin, Min; Wang, Yu-Xiang; Yang, Jing; Liang, Yingbin (July 2024, Proc. IEEE International Symposium on Information Theory (ISIT))

Full Text Available
Towards General Function Approximation in Nonstationary Reinforcement Learning

Feng, Songtao; Yin, Min; Wang, Yu-Xiang; Yang, Jing; Liang, Yingbin (March 2024, IEEE journal on selected areas in information theory)

Full Text Available
Toward General Function Approximation in Nonstationary Reinforcement Learning

https://doi.org/10.1109/JSAIT.2024.3381818

Feng, Songtao; Yin, Ming; Huang, Ruiquan; Wang, Yu-Xiang; Yang, Jing; Liang, Yingbin (January 2024, IEEE Journal on Selected Areas in Information Theory)

Full Text Available
Non-stationary Reinforcement Learning under General Function Approximation

Feng, Songtao and; Yin, Ming; Huang, Ruiquan; Wang, Yu-Xiang; Yang, Jing; Liang, Yingbin (July 2023, Proceedings of Machine Learning Research)

General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such an attempt. We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better in small variation budget scenario than the existing UCB-type algorithms. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.
more » « less
Precoding and Scheduling for AoI Minimization in MIMO Broadcast Channels

https://doi.org/10.1109/TIT.2022.3167618

Feng, Songtao; Yang, Jing (August 2022, IEEE Transactions on Information Theory)

Full Text Available
Non-stationary reinforcement learning under general function approximation

Feng, Songtao; Yin, Ming; Huang, Ruiquan; Wang, Yu-Xiang; Yang, Jing; Liang, Yingbin. (January 2023, Proc. International Conference on Machine Learning (ICML))

Full Text Available

Search for: All records